Focused Crawls, Tunneling, and Digital Libraries
Abstract
Crawling the Web to build collections of documents related to pre-specified topics became an active area of research during the late 1990s, after crawler technology had been developed for use by search engines. Web crawling is now being seriously considered as an important strategy for building large-scale digital libraries. This paper covers some of the crawl technologies that might be exploited for collection building. For example, focused crawling was developed to make such collection-building crawls more effective; its goal is a “best-first” crawl of the Web. We use powerful crawler software to implement a focused crawl, and we use tunneling to overcome some of the limitations of a pure best-first approach. Tunneling has been described by others as not only prioritizing the links from a page according to the page’s relevance score, but also estimating the value of each link and prioritizing the links accordingly. We add to this mix by devising a tunneling focused-crawling strategy that evaluates the current crawl direction on the fly to determine when to terminate a tunneling activity. Results indicate that a combination of focused crawling and tunneling could be an effective tool for building digital libraries.
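The interplay of best-first ordering and tunneling described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation: the in-memory link graph, the `relevance` function, the `threshold`, and the `max_tunnel` cutoff are all hypothetical stand-ins. In particular, the fixed cutoff on consecutive off-topic pages is an assumed, simplified substitute for the on-the-fly evaluation of crawl direction that the abstract mentions but does not specify.

```python
import heapq
from typing import Callable, Dict, List, Tuple


def focused_crawl(
    seeds: List[str],
    links: Dict[str, List[str]],
    relevance: Callable[[str], float],
    threshold: float = 0.5,
    max_tunnel: int = 2,
) -> List[str]:
    """Best-first crawl over a toy link graph with bounded tunneling.

    Assumptions (not from the paper): `links` stands in for fetching and
    parsing pages, and `max_tunnel` is a fixed limit on how many consecutive
    off-topic pages a path may contain before it is abandoned.
    """
    # Heap entries are (negated parent score, tunnel depth, url);
    # heapq pops the smallest tuple, so high-scoring parents come first.
    frontier: List[Tuple[float, int, str]] = [(-1.0, 0, s) for s in seeds]
    heapq.heapify(frontier)
    seen = set(seeds)
    collected: List[str] = []

    while frontier:
        _, depth, url = heapq.heappop(frontier)
        score = relevance(url)
        if score >= threshold:
            collected.append(url)
            depth = 0  # back on topic: reset the tunnel counter
        else:
            depth += 1  # one more off-topic page on this path
            if depth > max_tunnel:
                continue  # terminate this tunnel; do not expand its links
        for nxt in links.get(url, []):
            if nxt not in seen:
                seen.add(nxt)
                # Each outgoing link inherits the score of the page it was found on.
                heapq.heappush(frontier, (-score, depth, nxt))
    return collected


# Toy data: "target" sits two off-topic hops from the seed, "hidden" three.
toy_links = {
    "seed": ["off1", "junk1"],
    "off1": ["off2"],
    "off2": ["target"],
    "junk1": ["junk2"],
    "junk2": ["junk3"],
    "junk3": ["hidden"],
}
toy_scores = {"seed": 0.9, "target": 0.8, "hidden": 0.9}


def toy_relevance(url: str) -> float:
    return toy_scores.get(url, 0.1)


result = focused_crawl(["seed"], toy_links, toy_relevance)
print(result)  # ['seed', 'target']
```

With `max_tunnel=0` the sketch behaves like a crawl that never expands off-topic pages and finds only the seed. Raising the cutoff to 3 also reaches `hidden`, at the cost of fetching more irrelevant pages.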
Similar resources
Design and Evaluation Indicators for Digital Libraries
Introduction: There has always been uncertainty about the concept and framework of digital libraries. Terms such as electronic library, virtual library, library without walls, hybrid library, and digital library have often been used together or interchangeably to convey the library concept. Studies have shown that so far there is no standard, universally accepted definition of digital libraries, howe...
Context-aware systems: concept, functions and applications in digital libraries
Background and Aim: Libraries are among the places where context-aware systems and services would be very useful. The purpose of this study is to arrive at a coherent definition of context-aware systems and their applications, especially in digital libraries. Method: This review article was conducted using the library method, by searching articles and e-books on websites and databases. Results:...
Proposed content framework for digital literacy education to users in Iran
Aim: Today, digital literacy, a set of skills that enables people to use digital space effectively for success in personal, educational, and professional life, has become a necessity in all societies, and public libraries are among the most important providers of digital literacy education in the world. Digital literacy education has not been considered in public libraries in Iran. The first s...
Perceptions of the Faculty Members of Social Science Groups about the Challenges of Using Digital Information Resources and Libraries: A Case Study of Islamic Azad University
Apprendre à ordonner la frontière de crawl pour le crawling orienté
Focused crawling consists of searching for and retrieving from the Web a set of documents relevant to a specific domain of interest. Such crawlers prioritize their fetches by relying on a crawl frontier ordering strategy. In this article, we propose to learn this ordering strategy from annotated data using learning-to-rank algorithms. Such an approach allows us to cope with tunneling and to integrate ...